CNS Abstract

In this project I used Princple Component Analysis to examine the degree of separability between modeled neuron electrical recordings and real electrical recordings from actual neurons.

If biologically realistic models were better at imitating real experimental cells, then data and models would not easily be discriminable. By plotting a 48 dimensional feature space onto a two dimensional projection space, I show that a diverse pool of data and models are readily discriminated via Random Forest Classification, a result, that leaves even some of the most optimized models lacking. The idea is that the models which are the most resistant to being correctly machine-classified as models (therefore being misclassified as data), serve as better imitations/mimics of experimental data. I also used random forest regression to investigate when experimental data inform a classifying statistical model which dimensions explain the most of the observed variance in the feature space. Variance-explained will facilitate the production of a list of improvements to make to our models in order to render models better imitations of real data.

In this project you can see use of:

  • PCA, t-Distributed Stochastic Neighbor Embedding (t-SNE).
  • Random Forest Classification (RFC) using 38 features, and also RFC applied to just 2 features (output from PCA).
  • using the RFC "variance-explained" feature.
  • Plotting of a decision boundary. (need to redo).
  • Not done yet, but pending Cross-Validation using looping over many different test/train splits.

Broader Project Context and Background:

There is a great diversity of real biological neurons, all of which differ substantially in their electrical behavior. There are a few different classes of general purpose neuronal models, that can reproduce these different types of electrical behaviours, given appropriate parameterizations of the models.

An exisiting class of neuron model type, called The Izhikevich model was published with parameter sets believed to make the model outputs accurately align with a variety of real biological cell outputs. However since publication much very specific electro physiological recordings have accumulated, that in someways undermine model/experiment agreement. However it is now possible to constrain the Izhikevich model and find new parameterizations that more allow us to more accurately reproduce more recently published experimental data.

In contrast to other projects that seek to use features to seperate and classify two different categories of things that are hard to tell apart, such that humans can benefit from a fast classification of hard to discern differences in high dimensional spaces. In this project the goal is to use resistance to classification as an indicator of an optimization algorithms success, and to use machine seperation of data categories as an error signal, that directs us to precise locations of model failure. Another way of saying this, is, if a good/fair attempt at machine classification is hard, then then a different machine learning algorithm did a good job. If machine classification is very easy, the optimization algorithm did a poor job.

Code authorship.

I used the approach described herein for different research work intended for a conference abstract published as follows: J Birgiolas, R Jarvis, V Haynes, R Gerkin, SM Crook (2019) Automated assessment and comparison of cortical neuron models BMC Neuroscience 2019, 20(Suppl 1):P47

The application of TSNE to data was developed in a research team context on different data pertaining to ion channels, or the APs exclusively derived from models (as opposed to a combination of models and data). In the context of this project, I have used novel experimental data (pulled from the Allen Brain Portal API) and novel models (8 optimized cell models included), so I have re-applied a small amount of code from pre-established work, but I have made substantial novel contributions, by looking at different features, applying different feature engineering, applying Random Forest Classification, applying variance explained, and interpreting results. For a comparison to other pre-established work that informed this work check here

Model Optimization as a data pre-processing stage.

Before Machine Learning and analysis techniques could be applied, we needed to find optimized models. These optimized models can be understood as models that are intended to be superior mimics of real biologically derived data, as their governing equation parameters have been more rigorously constrained by a wider range of experimental data.

In order illustrate that the optimized models are better imitations of real data, four adaptive Exponential models, and four Izhikevich models each were fitted to four different classes of experimental cells see implementation in ipython notebook Notebook. These eight fitted models were subsequently fed into a Druckman feature extraction algorithm, and added as data points in a dimension reduced plot of the feature space. Many pre-existing neural models, and some Allen Brain Data where also plotted as contextual data in the same feature space.

Project Implementation and Technologies

  • Python, pandas sklearn, dask were all used for Model Optimization pre-processing steps, and for plotting the models in a dimension reduced feature space.
  • Models versus Data. Models which are resistant to being classified as models are more successful, and better representatives of data. See below.
  • The optimized cells were derived from a custom built parallel genetic algorithm, utilizing pre-existing python tools: DEAP and Dask. It would have been desirable to optimize the models with an algorithm from this course, such as Lasso, ridge regression, and elastic search (L1+L2)/2 regularization combined. The way I do this is to run a genetic algorithm over the data, The genetic algorithm is performing its own type of guided sparse sampling of the data.

The Druckman feature analysis protocol originates from MATLAB code associated with the analysis of Blue Brain Project Modelled cells, this feature analysis pipeline was then ported to Python by Justas Birgiolas, at a later point I made the feature analysis pipeline applicable to optimized Adaptive Exponential and Izhiketch cells. Rick Gerkin and Vergil Haynes, assisted in data cleaning preperation and TSNE application.

In [1]:
# !pip install update matplotlib
In [2]:
import plotly.graph_objs as go
import pickle
from sklearn import preprocessing

The highly informative figures below come from Efeatures: https://efel.readthedocs.io/en/latest/eFeatures.html#spike-shape-features

In the figures below you can some different electrical behavior corresponding to two different multi-spiking electrical experiments. spiking.png timing_and_after_spike_depth

The following figure shows the difference in a multispiking waveforms between an Izhikitich model and an adaptive exponential spiking model:

electrical_behavior.png

In [3]:
# THe transformed values, ordered from highest to lowest variance dimensions
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
import plotly.express as px
import plotly.graph_objects as go
from sklearn.decomposition import PCA
#![sag.png](sag.png)
import plotly.io as pio
GITHUB = True
if GITHUB:
    pio.renderers.default = "svg"
else:
    pio.renderers.default = 'notebook'

#![passive_properties.png](passive.png)
In [4]:
%%capture

data = []
import warnings
#warnings.filter("ignore")
import pickle
import plotly
import chart_studio.plotly as py
import plotly.offline as py
py.init_notebook_mode(connected=True)
import seaborn
seaborn.set_style("ticks", {"xtick.major.size": 8, "ytick.major.size": 8})
#!conda install -c plotly plotly-orca
useable = pickle.load(open('optimized_multi_feature','rb'))
useable[0].tests
In [5]:
import pickle
import pandas as pd
pd.set_option('precision', 3)
#pd.set_option('max_columns',None)
#pd.set_option('max_rows',None)

hbp = pickle.load(open("hbp_data.p","rb"))
df_o_m3 = None
dict_ = {}

for i in hbp:
    #for lit in list(i.out_dic.items()):
    if i is not None:
        #mod = rekeyeddm(mod)
        for v in list(i.out_dic.values()):
            temp = list(i.out_dic.values())
            if temp[1] is not None:
                dict_.update(temp[1][0])
                #print(temp[0])
        data = pd.DataFrame(data=dict_,index=[temp[0]])
        if i == 0:
            df_o_m3 = data
        else:
            df_o_m3 = pd.concat([data,df_o_m3])
            
del df_o_m3['interburst_voltage']            
In [6]:
df_o_m3 = pd.DataFrame.drop_duplicates(df_o_m3)
bbp_frame = df_o_m3
stable_list = list(bbp_frame.index.values);

df_o_m3;
stable_list;

df = pd.concat([df,bbp_frame])

In [7]:
import os
#import dask.dataframe as dd    
%matplotlib inline
import pickle
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os
sns.set(font_scale=1.5)
import pandas as pd
os.getcwd()
from sklearn.preprocessing import StandardScaler
import copy

from neuronunit.optimisation.optimization_management import three_step_protocol, filtered

# Special stuff to import
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap, TSNE

Load optimized reduced cell models

In [8]:
#try:
with open('dm_on_models.p','rb') as f:
    (RAW_dtc,ADEXP_dtc) = pickle.load(f)
    
  

Run the optimized cells through 3 different third party feature extraction routines.

In [9]:
%%capture 
try:
    useable = pickle.load(open('optimized_multi_feature','rb'))

except:
    useable = []

    for value in RAW_dtc.values():
        dtcpop = value
        dtcpop = [ dtc for dtc in dtcpop if type(dtc.rheobase) is not type(None) ]
        useable.extend(list(map(feature_mine,dtcpop)))

    useable = [ dtc for dtc in dtcpop if hasattr(dtc,'allen_30') ]
    pickle.dump(useable,open('optimized_multi_feature','wb'))

Collate 3 different feature sets into a unified and aligned data frame.

For five optimized cells

In [10]:
df_o_m2 = None
dict_ = {}

for i,dtc in enumerate(useable):
    dict_.update(**dtc.everything)
    dict_.update(**dtc.dm_results_regular)
    data = pd.DataFrame([dict_])
    if i == 0:
        df_o_m2 = data
    else:
        df_o_m2 = pd.concat([data,df_o_m2])

df_o_m = df_o_m2
df_o_m;

the blue brain data with the naan column may be what breaks other versions of this notebook.

Its data frame is called df_o_m3

In [11]:
    # A function to convert all cells containing array (or other things) into floats.  
def f(x):
    try:
        return np.mean(x)
    except:
        try:
            return np.mean(x['mean'])
        except:
            print(x)
            raise e

Finally we have a data frame just for optimized cells

Load wrangle and clean data

In [12]:
# Open the 1.5x rheobase file
import os
cwd = os.getcwd()
path2data = os.path.join(cwd,'data')
filename = os.path.join(cwd,'onefive_df.pkl')
with open(filename, 'rb') as f:
    df = pickle.load(f)

    # A function to convert all cells containing array (or other things) into floats.  
def custom_cleaner(x):
    try:
        return np.mean(x)
    except:
        try:
            return np.mean(x['pred'])
        except:
            return np.mean(x['mean'])
df = df.fillna(0).applymap(custom_cleaner)

# Apply this function to each dataframe in order to convert all cells into floats.
# Also call fillna() first to impute missing values with 0's.  
%time df = df.fillna(0).applymap(custom_cleaner)
#df_30x = df_30x.fillna(0).applymap(f)
df.head();
CPU times: user 24.8 s, sys: 170 ms, total: 25 s
Wall time: 24.6 s
In [13]:
def mutate_frame_columns(df_everything,modifyer):
    """
    Adjust data frame column names so they fit into a universal scheme.
    """
    for c in df_everything.columns:
        if str(c)+str('_1.5x') in modifyer.columns:
            df_everything[str(c)+str('_1.5x')] = df_everything[c]
        if str(c)+str('_3.0x') in modifyer.columns:
            df_everything[str(c)+str('_3.0x')] = df_everything[c]
    temp = df_everything
    for c in df_everything.columns:
        if str('_1.5x') not in str(c) and str('_3.0x') not in str(c): #in df_everything.columns:
            temp = temp.drop(columns=[c])
        #if str(c)+str('_3.0x') in df_everything.columns:
        #    df_everything[str(c)+str('_1.5x')] = df_everything[c]
    df_everything = temp
    return df_everything


df_o_m3 = mutate_frame_columns(df_o_m3,df)
In [14]:
df_o_m3.tail(5);
In [15]:
df = pd.concat([df,df_o_m3])
df['AP_fall_indices_1.5x'];
In [16]:
df = df.fillna(0)#.applymap(f)
df = df.applymap(custom_cleaner)
In [17]:
bbp_frame = pd.DataFrame.drop_duplicates(bbp_frame)
In [18]:
df.index[42]
df_everything = df
In [19]:
df_everything = df_everything.fillna(0)
df_everything;
In [20]:
df_everything.tail(15);

Declare a dictionary

Which serves as a table helping us to align Druckman features with high current and low current protocols in other feature sets.

In [21]:
df = df_everything
In [22]:
def remove_reduncy(df,removed):
    for c in removed.columns:
        if c in df.columns:
            del df[c]
    return df

with open('redundancy_removed.p','rb') as f:
    removed = pickle.load(f)
    
df = remove_reduncy(df,removed)

for c in removed.columns:
    try:
        del df[c]
    except:
        pass
In [23]:
df = df.fillna(0).applymap(custom_cleaner)

Finally we have prepared a pandas data frame, where the first half of the data frame, that is the first 448 entries are Druckman measurements pertaining to voltage traces recorded in real biological cells. Appended immediately below in the same data frame we have 965 Druckman measurements derived from NeuroML models. A print out of this frame follows.

In [24]:
df.tail(15);

Identify of the features that explained most variance.

This is a list of the features that explain the most variance.

delete features

which were not available or accessible in the real data. This includes mainly negative current injection protocols such as input resistance etc.

if 'InputResistanceTest_1.5x' in df.columns:

# in order to find out what is seperating and what is not.
del df['InputResistanceTest_1.5x']
del df['InputResistanceTest_3.0x']
del df['ohmic_input_resistance_1.5x']
del df['ohmic_input_resistance_3.0x']
del df['time_1.5x']                              
#       0.190362
del df['decay_time_constant_after_stim_3.0x']
del df['voltage_deflection_3.0x']
del df['steady_state_hyper_3.0x']
del df['steady_state_voltage_stimend_3.0x']
del df['voltage_deflection_vb_ssse_3.0x']
del df['sag_amplitude_3.0x']
#0.198310
del df['is_not_stuck_1.5x']
del df['AHP_depth_abs_1.5x']

"""

In [25]:
df.tail(15);
In [ ]:

In [26]:
# Join the dataframes horizonstally, so that all features coming from df_15x get a '_1.5x' suffix
# and all the ones from df_30x get a '_3.0x' suffix
#df = df_15x.join(df_30x, lsuffix='_1.5x', rsuffix='_3.0x')
def other_cleaner(x):
    if type(x) is type(dict()):
        try:
            return np.mean(x['pred'])
        except:
            return np.mean(x['mean'])

    else:
        return np.mean(x)
   
df = df_everything.fillna(0).applymap(other_cleaner)
df_everything = df
In [27]:
print("There are %d models+data and %d features" % df_everything.shape)
#print("There are %d models+data and %d features" % dask_frame.shape)
There are 7326 models+data and 51 features

The data frame is big.

*lets experiment with dask -lazy pandas array to avoid storing all the data frame in memory all at once.

In [28]:
# Turn all features into Normal(0,1) variables
# Important since features all have different scales
In [29]:
# make model dataframe

model_idx2 = df.head(7).index.tolist()#  list(range(0,9))#idx for idx in df.index.values if type(idx)==str]
model_no_trans_df2 = df[df.index.isin(model_idx2)]
model_no_trans_df2.index.name = 'OptCells'
model_df2 = model_no_trans_df2.copy()
model_df2.index.name = 'OptCells'
#model_df2[:]["type"] = "opt_models"
model_idx = [idx for idx in df.index.values if type(idx)==str]
model_no_trans_df = df[df.index.isin(model_idx)]
model_no_trans_df.index.name = 'Cell_ID'
model_df = model_no_trans_df.copy()
model_df.index.name = 'Cell_ID'

# make experiment dataframe
experiment_idx = [idx for idx in df.index.values if type(idx)==int]
experiment_no_trans_df = df[df.index.isin(experiment_idx)]
experiment_df = experiment_no_trans_df.copy()

model_df2
model_no_trans_df2;
In [30]:
experiment_df = pd.DataFrame.drop_duplicates(experiment_df)
experiment_df.groupby(experiment_df.index).first()
len(experiment_df);
In [31]:
model_df = pd.DataFrame.drop_duplicates(model_df)
model_df.index
model_df.groupby(model_df.index).first()
not_empty = copy.copy(df)
not_empty;

Select out models

that we know are intended to be highly representative of the data. These cells models are by author Gouwens

Primary visual area layer 4 spiny 479728896 Cell https://neuroml-db.org/model_info?model_id=NMLCL001508

In the cells below.

I use PCA as a heuristic aid, to facilitate human intuition about Machine classification of data versus models.

The Dimensionality reduction assists human visual system classification

It is possible for the human visual system to see three clusters of cell data points.

Let's show that we can classify in this low dimensional space (by just using two features). We will slowly build up to classification via first applying Kmeans, to visualize cluster centres. And then move on to using a random forest approach to actually visualizing decision boundaries.

make experiment dataframe

In [ ]:
 
In [ ]:

In [ ]:

In [32]:
def remove_reduncy(df,removed):
    for c in removed.columns:
        if c in df.columns:
            del df[c]
    return df

with open('redundancy_removed.p','rb') as f:
    removed = pickle.load(f)
    
df = remove_reduncy(df,removed)
#print(len(df.columns))

What does variance look like in just the data?

Lets examine Variance in the different Data sources alone.

We can answer the question are electrophyiological measurements in cells from different experimental sources discriminible.

Actually there is good agreement in the variance between Blue Brain Cell Data and Allen Brain Cell data.

Adding in a variety of models could change this story.

In [33]:
df_o_m3.index;
In [34]:
df_o_m3 = df_o_m3.replace([np.inf, -np.inf], np.nan)#.dropna()
experiment_df = experiment_df.replace([np.inf, -np.inf], np.nan)

df_data = pd.concat([experiment_df,df_o_m3])

df1 = copy.copy(df)

for c in df1.columns:
    if np.mean(df1[c])==0 or np.var(df1[c])==0:
        print(c)
        
df.apply(custom_cleaner);
        

seperate models and data by color code -- done

break columns into chunks -- not done

more horizontal space between plots -- not done

logically reorganize

play around with KDE plot smoothing kernel cutoffs get rid of outliers with cutoff -- done

convert 600dpi pdf

bw bandwidth

In [35]:
#model_index_labels
import pickle
In [36]:
'''
pd.set_option('precision', 3)
n_cols = 3
n_rows = int(len(df1.columns)/3)+1
fig, axes = plt.subplots(n_rows,3,figsize=(15, 3*n_rows))
for i, feature in enumerate(df1.columns):
    ax = axes.flat[i]
    ax.set_ylabel(str(feature)[0:10])
    #df1[feature].hist(alpha=0.3, ax=ax)
    n = df1.loc[model_index_labels, feature].notnull().sum()
    sns.kdeplot(df1.loc[model_index_labels, feature], color='r', ax=ax, label=('%d' % n))
    sns.kdeplot(df1.loc[experiment_idx_labels, feature], color='b', ax=ax)
    #ax.legend()
    ax.set_ylabel(feature.replace('_', '\n'))
'''    
    #ax.set_xlabel(units[feature])
#plt.tight_layout()    
Out[36]:
"\npd.set_option('precision', 3)\nn_cols = 3\nn_rows = int(len(df1.columns)/3)+1\nfig, axes = plt.subplots(n_rows,3,figsize=(15, 3*n_rows))\nfor i, feature in enumerate(df1.columns):\n    ax = axes.flat[i]\n    ax.set_ylabel(str(feature)[0:10])\n    #df1[feature].hist(alpha=0.3, ax=ax)\n    n = df1.loc[model_index_labels, feature].notnull().sum()\n    sns.kdeplot(df1.loc[model_index_labels, feature], color='r', ax=ax, label=('%d' % n))\n    sns.kdeplot(df1.loc[experiment_idx_labels, feature], color='b', ax=ax)\n    #ax.legend()\n    ax.set_ylabel(feature.replace('_', '\n'))\n"
In [37]:
'''
n_cols = 3
n_rows = int(len(df1.columns)/3)+1
fig, axes = plt.subplots(n_rows,3,figsize=(15, 3*n_rows))
for i, feature in enumerate(df1.columns):
    ax = axes.flat[i]
    ax.set_ylabel(str(feature)[0:10])
    #df1[feature].hist(alpha=0.3, ax=ax)
    n = df1.loc[model_index_labels, feature].notnull().sum()
    sns.kdeplot(df1.loc[model_index_labels, feature], color='r', ax=ax, label=('%d' % n))
    sns.kdeplot(df1.loc[experiment_idx_labels, feature], color='b', ax=ax)
    #ax.legend()
    ax.set_ylabel(feature.replace('_', '\n'))

fig, axes = plt.subplots(int(np.sqrt(len(df1.columns))), int(np.sqrt(len(df1.columns)))+1,figsize=(15, 15))
for i, feature in enumerate(df1.columns):
    axes.flat[i].set_ylabel(str(feature)[0:10])
    min_ = np.min(df1[feature])
    max_ = np.max(df1[feature])
    axes.flat[i].set_xlim(min_,max_)
    df1[feature].plot(alpha=0.3, ax=axes.flat[i])
plt.tight_layout()    
'''
Out[37]:
"\nn_cols = 3\nn_rows = int(len(df1.columns)/3)+1\nfig, axes = plt.subplots(n_rows,3,figsize=(15, 3*n_rows))\nfor i, feature in enumerate(df1.columns):\n    ax = axes.flat[i]\n    ax.set_ylabel(str(feature)[0:10])\n    #df1[feature].hist(alpha=0.3, ax=ax)\n    n = df1.loc[model_index_labels, feature].notnull().sum()\n    sns.kdeplot(df1.loc[model_index_labels, feature], color='r', ax=ax, label=('%d' % n))\n    sns.kdeplot(df1.loc[experiment_idx_labels, feature], color='b', ax=ax)\n    #ax.legend()\n    ax.set_ylabel(feature.replace('_', '\n'))\n\nfig, axes = plt.subplots(int(np.sqrt(len(df1.columns))), int(np.sqrt(len(df1.columns)))+1,figsize=(15, 15))\nfor i, feature in enumerate(df1.columns):\n    axes.flat[i].set_ylabel(str(feature)[0:10])\n    min_ = np.min(df1[feature])\n    max_ = np.max(df1[feature])\n    axes.flat[i].set_xlim(min_,max_)\n    df1[feature].plot(alpha=0.3, ax=axes.flat[i])\nplt.tight_layout()    \n"
In [38]:
df.columns;
ss = StandardScaler()
df[:] = ss.fit_transform(df.values)
df.groupby(df.index).first()
df = pd.DataFrame.drop_duplicates(df)

df[:] = preprocessing.normalize(df.values, norm='l2')
In [39]:
with open('gouwens.p','rb') as f:
    gouwens = pickle.load(f)
gwn_check = list(gouwens.values())
gwindex = gwn_check[0][0]
gouwens_idx = [idx for idx in df.index.values if idx in gwindex]
gouwens_idx_labels = [i for i in gouwens_idx]
gouwens_idx_labels = df.index.isin(gouwens_idx)
gouwens_cells = df[df.index.isin(gouwens_idx)]

unusual = df[df.index.isin(gouwens_idx)]
unusual
print(gouwens_cells.index[-7],'a Gouwens model that well represents data')
gouwens_cells;
with open('markram.p','rb') as f:
    markram = pickle.load(f)
markram_check = list(markram.values())
markramindex = markram_check[0][0]
markram_idx = [idx for idx in df.index.values if idx in markramindex]
markram_idx_labels = [i for i in markram_idx]
markram_idx_labels = df.index.isin(markram_idx)
markram_cells = df[df.index.isin(markram_idx)]

markram_cells;
experiment_idx = [idx for idx in df.index.values if type(idx)==int]
model_no_trans_df = df[~df.index.isin(experiment_idx)]
experiment_idx_labels = [(i,idx) for i,idx in enumerate(df.index.values) if type(idx)==int]

#model_df
#df.labels
model_no_trans_df
experiment_idx_labels = [i[0] for i in experiment_idx_labels]
experiment_idx_labels
model_no_trans_df
model_index_labels = ~df.index.isin(experiment_idx)

model_index_labels


new_models_idx = df.head(7).index.tolist()#  list(range(0,9))#idx for idx in df.index.values if type(idx)==str]
new_model_labels= df.index.isin(new_models_idx)
#len(new_models_idx)
nm = df.head(7)
nm;
bbp_labels = df.index.isin(stable_list)
bbpindex = df.index.isin(stable_list)
bbpindex = [i for i,idx in enumerate(df.index.values) if idx in stable_list]
bbp = []
for idx in df.index.values:
    if idx in stable_list:
        bbp.append(True)
    else:
        bbp.append(False)
gouwens_idx_labels
bbp_idx = [idx for idx in df.index.values if idx in stable_list]
bbp_idx_labels = [i for i in bbp_idx]
bbp_idx_labels = df.index.isin(bbp_idx_labels)


model_idx;
model_idx = [idx for idx in df.index.values if type(idx)==str and idx not in bbp_idx]

model_index_labels = df.index.isin(model_idx)
experiment_idx_labels = df.index.isin(experiment_idx)
bbp_idx_labels = [idx for idx in df.index.values if type(idx)==str and idx and idx in bbp_idx]


bbp_idx_relabels = [(i,idx) for i,idx in enumerate(df.index.values) if idx in bbp_idx]
bbp_labels = df.index.isin(stable_list)
bbpindex = df.index.isin(stable_list)#.get_loc()

df[bbpindex].index.values;


with open('traub.p','rb') as f:
    traub = pickle.load(f)
traub_check = list(traub.values())
traubindex = traub_check[0][0]

traub_idx = [idx for idx in df.index.values if idx in traubindex]
traub_idx_labels = df.index.isin(traub_idx)
traub_cells = df[df.index.isin(traub_idx)]
traub_cells;
NMLCL001508 a Gouwens model that well represents data
In [40]:
df.columns;
In [41]:
import quantities as qt
qt.dimensionless
Out[41]:
Dimensionless('dimensionless', 1.0 * dimensionless)
In [42]:
shape_index = { 'AP_fall_indices_1.5x':qt.dimensionless,
       'AP_fall_indices_3.0x':qt.dimensionless, 
       'AP_rise_indices_1.5x':qt.dimensionless, 'AP_end_indices_3.0x':qt.dimensionless,
       'fast_trough_index_1.5x':qt.dimensionless, 'fast_trough_index_3.0x':qt.dimensionless,
       'min_AHP_indices_1.5x':qt.dimensionless, 'min_AHP_indices_3.0x':qt.dimensionless,        
       'peak_index_1.5x':qt.dimensionless, 
       'peak_index_3.0x':qt.dimensionless, 
       'peak_indices_1.5x':qt.dimensionless,
       'peak_indices_3.0x':qt.dimensionless,
       'threshold_index_1.5x':qt.dimensionless, 
       'threshold_index_3.0x':qt.dimensionless, 
       'upstroke_index_1.5x':qt.dimensionless, 
       'upstroke_index_3.0x':qt.dimensionless }
        
In [43]:
units = qt.ms 
spike_shape = {'AP1RateOfChangePeakToTroughTest_3.0x':units,  
               'APlast_width_1.5x':units, 
               'initburst_sahp_vb_1.5x':qt.mV,
               'peak_t_1.5x':units, 
               'peak_t_3.0x':units, 
               'peak_time_3.0x':units,
               'sag_ratio1_3.0x':qt.mV/qt.ms, 
               'steady_state_hyper_1.5x':qt.mV,
               'steady_state_voltage_stimend_1.5x':qt.mV,
               'fast_trough_t_1.5x':qt.ms, 
               'fast_trough_t_3.0x':qt.ms, 
               'upstroke_t_1.5x':qt.ms,
               'upstroke_t_3.0x':qt.ms,              
                'voltage_after_stim_3.0x':qt.mV, 
                'voltage_after_stim_1.5x':qt.mV, 
                'steady_state_voltage_1.5x':qt.mV,
                'threshold_t_1.5x':qt.dimensionless,
                'threshold_t_3.0x':qt.dimensionless
              }
    
spike_time = {'AP2DelaySDTest_1.5x':units,
              'AP2DelaySDTest_3.0x':units, 
              'AP_rise_time_1.5x':units,
              'AP_rise_time_3.0x':units, 
              'ISIBurstMeanChangeTest_3.0x':units, 
              'ISICVTest_3.0x':units,
              'ISI_log_slope_skip_3.0x':units,
              'adaptation_index2_3.0x':qt.mV}

units = qt.MOhm 
resistance= {'InputResistanceTest_1.5x':units,
            'InputResistanceTest_3.0x':units ,
            'ohmic_input_resistance_1.5x':qt.MOhm , 
            'ohmic_input_resistance_3.0x':qt.MOhm ,
            }    



               
accomodation = {'AccommodationRateMeanAtSSTest_3.0x':'spikes/2 second', 
                'AccommodationRateToSSTest_3.0x':'spikes/2 second', 'adaptation_index2_3.0x':'spikes/2 second'}
        
rate = {'Spikecount_stimint_1.5x':'spike/2 Second','Spikecount_stimint_3.0x':'spike/2 Second'}

Accomodation

In [44]:
n_cols = 3
n_rows = int(len(accomodation)/3)+1
fig, axes = plt.subplots(1,3,figsize=(35, 15))
for i, (feature,units) in enumerate(accomodation.items()):
    ax = axes.flat[i]

    n = df.loc[model_index_labels, feature].notnull().sum()
    sns.kdeplot(df.loc[model_index_labels, feature], color='r', ax=ax,bw=0.025) # label=('%d' % n)
    sns.kdeplot(df.loc[experiment_idx_labels, feature], color='b', ax=ax,bw=0.025)
    ax.set_ylabel(feature.replace('_', '\n'))
    ax.set_xlabel(units)
plt.tight_layout()
plt.title('Accomodation')

fig.savefig('accomodation.pdf', format='pdf', dpi=600)

Spike Count

In [45]:
n_cols = 3
n_rows = int(len(rate)/3)+1
fig, axes = plt.subplots(1,3,figsize=(35, 15))
for i, (feature,units) in enumerate(rate.items()):
    ax = axes.flat[i]
    #ax.set_ylabel(str(feature)[0:10])
    #df1[feature].hist(alpha=0.3, ax=ax)
    n = df.loc[model_index_labels, feature].notnull().sum()
    sns.kdeplot(df.loc[model_index_labels, feature], color='r', ax=ax)#, label=('%d' % n))
    sns.kdeplot(df.loc[experiment_idx_labels, feature], color='b', ax=ax)
    #ax.legend()
    ax.set_ylabel(feature.replace('_', '\n'))
    ax.set_xlabel(units)
plt.tight_layout()
plt.title('Spike Count')

fig.savefig('spike_count.pdf', format='pdf', dpi=600)
In [ ]:
 

resistance measurements

In [46]:
n_cols = 3
n_rows = int(len(resistance)/3)+1
#fig, axes = plt.subplots(n_cols,n_rows,figsize=(25, 15))
fig, axes = plt.subplots(n_rows,n_cols,figsize=(55, 45))
bw=0.045
for i, (feature,units) in enumerate(resistance.items()):
    ax = axes.flat[i]
    n = df.loc[model_index_labels, feature].notnull().sum()
    
    sns.kdeplot(df.loc[model_index_labels, feature], color='r', ax=ax,bw=bw)
    sns.kdeplot(df.loc[experiment_idx_labels, feature], color='b', ax=ax,bw=bw)
    ax.set_xlabel(units)#+str("_bandwidth_kernel"+str(bw)))
    ax.set_ylabel(feature.replace('_', '\n'))

    ax.legend()
plt.title('Input Resistance')
    
fig.savefig('input_resistance.pdf', format='pdf', dpi=600)
    
#plt.tight_layout()    
    

Spike Shape

In [47]:
n_cols = 3
n_rows = int(len(spike_shape)/3)
plt.clf()
fig, axes = plt.subplots(n_rows,n_cols,figsize=(55, 45))
for i, (feature,units) in enumerate(spike_shape.items()):
    ax = axes.flat[i]
    n = df.loc[model_index_labels, feature].notnull().sum()
    sns.kdeplot(df.loc[model_index_labels, feature], color='r', ax=ax,bw=0.045)
    sns.kdeplot(df.loc[experiment_idx_labels, feature], color='b', ax=ax,bw=0.045)
    ax.set_xlabel(units)
    ax.set_ylabel(feature.replace('_', '\n'))
plt.tight_layout()
plt.title('Spike Time')

fig.savefig('spike_shape_time.pdf', format='pdf', dpi=600)

    #ax.legend()
    
<Figure size 432x288 with 0 Axes>

shape index

In [48]:
n_cols = 3
n_rows = int(len(shape_index)/3)+1
plt.clf()
fig, axes = plt.subplots(n_rows,n_cols,figsize=(55, 45))
for i, (feature,units) in enumerate(shape_index.items()):
    ax = axes.flat[i]
    n = df.loc[model_index_labels, feature].notnull().sum()
    sns.kdeplot(df.loc[model_index_labels, feature], color='r', ax=ax,bw=0.045)
    sns.kdeplot(df.loc[experiment_idx_labels, feature], color='b', ax=ax,bw=0.045)
    ax.set_xlabel(units)
    ax.set_ylabel(feature.replace('_', '\n'))
plt.tight_layout()
plt.title('Relative Time Array Indexs')
fig.savefig('spike_shape_index.pdf', format='pdf', dpi=600)
<Figure size 432x288 with 0 Axes>

In the slide below:

You can see a plot of the high dimensional Druckman feature space projected into a low dimensional space using rotation matrices found via a regular PCA algorithm (not T distributed stochastic neighbourhood embedding).

PCA uses rotated covariance matrices to project original data into the directions where variance in the data is maximum. One disadvantage of this approach is that two of the highest weighted eigenvalues yield synthetic dimensions, synthetic dimensions that are hard to relate back to a just a few of the original Druckman dimensions.

In this way PCA and TSNE are useful data exploration tools, by the may not always lead to a complete understanding of the data.

In order to circumvent this problem we will use the variance-explained feature of "Random Forest" classification algorithm. Random Forest variance explained, will probably hint at which dimensions comprize the greatest eigenvalues/weights of the PCA algorithm.

In [49]:
isomap = Isomap(n_components=2)
isomap.fit(copy.copy(df.values))
iso = isomap.embedding_.T


pca = PCA()

pca.fit(df.values)
n_features = df.shape[1]
transformed = pca.transform(copy.copy(df.values))


# Do PCA and look at variance explained
fig = plt.figure()

plt.plot(range(1,n_features+1),pca.explained_variance_ratio_.cumsum())
plt.xlim(0,50);
plt.xlabel('Number of principal components')
plt.ylabel('Fraction of variance explained');
df;

#iso = iso.T
#iso[0,experiment_idx_labels]
In [50]:
print(np.shape(iso))
(2, 1589)
In [51]:
#iso = iso.T

fig, ax = plt.subplots(figsize=(25, 25))

plt.scatter(iso[0,experiment_idx_labels],iso[1,experiment_idx_labels],c='green',cmap='rainbow',label='data')
plt.scatter(iso[0,model_index_labels],iso[1,model_index_labels],alpha=0.5,c='red',cmap='rainbow',label='models')
plt.scatter(iso[0,new_model_labels],iso[1,new_model_labels],c='orange',cmap='rainbow',label='optimized/revised models')

plt.scatter(iso[0,experiment_idx_labels][42],iso[1,experiment_idx_labels][42],s=700,c='yellow', alpha=0.3,cmap='rainbow',label='Real Data Layer 4 aspiny 313862167')
plt.scatter(iso[0,gouwens_idx_labels][-7],iso[1,gouwens_idx_labels][-7],s=700,c='purple', alpha=0.3,cmap='rainbow',label='Model layer 4 spiny 479728896')
plt.scatter(iso[0,markram_idx_labels],iso[1,markram_idx_labels],s=50,c='cyan',cmap='rainbow',label='Markram models')


plt.scatter(iso[0,gouwens_idx_labels],iso[1,gouwens_idx_labels],c='black',cmap='rainbow',label='Gouwens models')
#plt.scatter(iso[0,regular_iz_idx_labels],iso[1,regular_iz_idx_labels],c='black',cmap='rainbow',label='Gouwens models')
plt.scatter(iso[0,bbp],iso[1,bbp],s=50, c='purple',cmap='rainbow',label='BBP')

#plt.scatter(iso[0,:],iso[1,:],c='green',cmap='rainbow',label='data')

legend = ax.legend()#handles, labels, loc="upper right", title="Sizes")
# I don't love the isomap fit
In [53]:
          
In [54]:
# for k,v in regions.items(): print(k,v.split(',')[0]);

the above process not deeply informative.

It looks like the regions are thalamo cortical (Traub) the regions are Somatosensory (Markram) the labels belong to regions are Gouwens IV (Mus Musculus)

In [55]:
# standard Normalizer

est = KMeans(n_clusters=3)
est.fit(iso.T)
y_kmeans = est.predict(iso.T)
centers = est.cluster_centers_

Another plot but with Kmeans cluster centers included. Showing the cluster centres is a first step towards showing that machine classification on the dimension reduced version of the Druckman data feature space.

In the plot below the two large yellow dots are the cluster centres for (left models), (right data). The Euclidian distnace from each data point from a cluster centre is directly proportional too which category the data point is from (ie model or data, ie red/blue). This visualization would assist us to understand using KMeans nearist neighbours classification algorithm to classify the data.

IN a Random Forest Classification Analysis performed much further below we examine the dimension that contributes the most to cluster seperation by looking at variance explained. This gives us an educated guess about dimensions that contribute the most weight to the axis of the PCA projection spaces plotted above.

It is likely that the axis in the PCA plot below, are strongly aligned with "Input Resistance" in models and data, as well as "AP2RateOfChangePeakToTroughTest". This second dimension means considering multi-spiking waveforms observed in models and data, at the second Action Potential/Spike, how rapid is the decay from peak to trough of the second AP wave.

In [56]:
plt.clf()
fig = plt.figure(figsize=(20,20))
ax = plt.subplot(111)

plt.scatter(iso[0,experiment_idx_labels],iso[1,experiment_idx_labels],c='green',cmap='rainbow',label='Allen Brain Experimental Data')
plt.scatter(iso[0,model_index_labels],iso[1,model_index_labels],c='red',cmap='rainbow',label='models')
plt.scatter(iso[0,new_model_labels],iso[1,new_model_labels],c='orange',cmap='rainbow',label='optimized/revised models')
plt.scatter(iso[0,gouwens_idx_labels],iso[1,gouwens_idx_labels],c='black',cmap='rainbow',label='Gouwens models')
plt.scatter(iso[0,markram_idx_labels],iso[1,markram_idx_labels],c='cyan',cmap='rainbow',label='Markram models')
plt.scatter(iso[0,bbp],iso[1,bbp],s=90, c='purple',cmap='rainbow',label='BBP')
#plt.scatter(iso[0,:],iso[1,:],s=10, c='purple',cmap='rainbow',label='BBP')

plt.scatter(centers[0][0],centers[0][1],s=7000,c='blue', alpha=0.3, edgecolors='blue')#,label='cluster 1')
plt.scatter(centers[1][0],centers[1][1],s=7000,c='blue', alpha=0.3,edgecolors='blue')#,label='cluster 2')
plt.scatter(centers[2][0],centers[2][1],s=7000,c='blue', alpha=0.3,edgecolors='blue')#,label='cluster 3')
#plt.scatter(centers[3][0],centers[3][1],s=7000,c='blue', alpha=0.3,edgecolors='blue')#,label='cluster 3')

legend = ax.legend()#handles, labels, loc="upper right", title="Sizes")
<Figure size 432x288 with 0 Axes>
In [57]:
trace7=([iso[0,bbp]],[iso[1,bbp]],'BBP',bbp)
In [58]:
data = []
trace0=(iso[0,experiment_idx_labels],iso[1,experiment_idx_labels],'Allen Brain Experimental Data',experiment_idx)
trace1=(iso[0,model_index_labels],iso[1,model_index_labels],'models',model_idx)
trace2=(iso[0,new_model_labels],iso[1,new_model_labels],'optimized models',new_models_idx)
trace3=(iso[0,gouwens_idx_labels],iso[1,gouwens_idx_labels],'Gouwens models',gouwens_idx)
trace4=(iso[0,markram_idx_labels],iso[1,markram_idx_labels],'Markram models',markram_idx)

trace5=([iso[0,experiment_idx_labels][42]],[iso[1,experiment_idx_labels]],'Layer 4 aspiny 313862167',experiment_idx_labels)
trace6=([iso[0,gouwens_idx_labels][-7]],[iso[1,gouwens_idx_labels][-7]],'Layer 4 spiny 479728896',gouwens_idx_labels)
#plt.scatter(iso[0,regular_iz_idx_labels],iso[1,regular_iz_idx_labels],c='black',cmap='rainbow',label='Gouwens models')
#trace7=([iso[0,regular_iz_idx_labels]],[iso[1,regular_iz_idx_labels]],'regular_iz_idx_labels',regular_iz_idx_labels)
#trace7=([iso[0,bbp]],[iso[1,bbp]],'BBP',df[bbpindex].index.values)
trace7=(iso[0,bbp],iso[1,bbp],'Blue Brain Project Ephys Data',df[bbpindex].index.values)
trace8=(iso[0,traub_idx_labels],iso[1,traub_idx_labels],'Traub',traub_idx_labels)

#plt.scatter(iso[0,bbp_labels],iso[1,bbp_labels],s=10, c='purple',cmap='rainbow',label='BBP')

traces = [trace0,trace1,trace2,trace3,trace4,trace5,trace6,trace7]#,trace8]
cnt=0
theme = px.colors.diverging.Portland

for i,ttt in enumerate(traces):
    if cnt==len(theme):
        cnt=0
    if i>1:
        size = 6
    else:
        size = 6
        #if type(ttt[3]) is not type(str()):
        
  
    trace = dict(
        type='scatter',
        text = df[df.index.isin(ttt[3])].index,
        x=ttt[0],
        y=ttt[1],
        mode='markers',
        name=ttt[2],
        marker=dict(
            color=theme[cnt],
            size=size,
            line=dict(
                color='rgba(217, 217, 217, 0.14)',
                width=0.5),
            opacity=0.8)
    )
    data.append(trace)
    cnt+=1



fig = go.Figure(
    data=data,
    layout_title_text="steady_state_voltage_stimend_3.0x versus AP1DelayMeanTest_3.0x"
)#,

fig.update_layout(
    autosize=False,
    width=1050,
    height=1050
)


if GITHUB:
    fig.show("svg")
else:
    pio.show(fig)
−6−4−202−4−2024Allen Brain Experimental Datamodelsoptimized modelsGouwens modelsMarkram modelsLayer 4 aspiny 313862167Layer 4 spiny 479728896Blue Brain Project Ephys Datasteady_state_voltage_stimend_3.0x versus AP1DelayMeanTest_3.0x
In [59]:
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

data = []
trace0=(iso[0,experiment_idx_labels],iso[1,experiment_idx_labels],'Allen Brain Experimental Data',experiment_idx)
trace1=(iso[0,model_index_labels],iso[1,model_index_labels],'models',model_idx)
trace2=(iso[0,new_model_labels],iso[1,new_model_labels],'optimized models',new_models_idx)
trace3=(iso[0,gouwens_idx_labels],iso[1,gouwens_idx_labels],'Gouwens models',gouwens_idx)
trace4=(iso[0,markram_idx_labels],iso[1,markram_idx_labels],'Markram models',markram_idx)

trace5=([iso[0,experiment_idx_labels][42]],[iso[1,experiment_idx_labels]],'Layer 4 aspiny 313862167',experiment_idx_labels)
trace6=([iso[0,gouwens_idx_labels][-7]],[iso[1,gouwens_idx_labels][-7]],'Layer 4 spiny 479728896',gouwens_idx_labels)
#plt.scatter(iso[0,regular_iz_idx_labels],iso[1,regular_iz_idx_labels],c='black',cmap='rainbow',label='Gouwens models')
#trace7=([iso[0,regular_iz_idx_labels]],[iso[1,regular_iz_idx_labels]],'regular_iz_idx_labels',regular_iz_idx_labels)
#trace7=([iso[0,bbp]],[iso[1,bbp]],'BBP',df[bbpindex].index.values)
trace7=(iso[0,bbp],iso[1,bbp],'Blue Brain Project Ephys Data',df[bbpindex].index.values)
#trace7=(iso[0,bbp],iso[1,bbp],'Blue Brain Project Ephys Data',df[bbpindex].index.values)

#plt.scatter(iso[0,bbp_labels],iso[1,bbp_labels],s=10, c='purple',cmap='rainbow',label='BBP')

traces = [trace0,trace1,trace2,trace3,trace4,trace5,trace6,trace7,None]

groundtruth = np.array(df.index.isin(experiment_idx))


classif = OneVsRestClassifier(SVC(kernel='linear'))
classif.fit(iso.T, groundtruth)
min_x = np.min(iso[0,:])
max_x = np.max(iso[0,:])
w = classif.coef_[0]
a = -w[0] / w[1]
xx = np.linspace(min_x, max_x)  # make sure the line is long enough
yy = a * xx - (classif.intercept_[0]) / w[1]


cnt=4
theme = px.colors.diverging.Portland

used_columns = df.columns
pal = sns.husl_palette(8, s=.45)
lut = dict(zip(map(str, used_columns), pal))
colors = pd.Series(used_columns, index=df.columns).map(lut)

for i,ttt in enumerate(traces):
    #if cnt==len(theme):
    #    cnt=0
    size = 6
    if i>1:
        pass
        #size = 12
    else:
        size = 6
        text = df[df.index.isin(ttt[3])].index
  

    
    if ttt==None:
        trace = {
          "line": {
            "dash": "solid", 
            "color": "rgb(255,0,0)", 
            "shape": "linear", 
            "width": 2
          }, 
          "mode": "lines", 
          "name": "Decision Boundary", 
          "text": "Decision Boundary", 
          "type": "scatter", 
          "x": xx, 
          "y": yy,
          "yaxis": "y1", 
          "showlegend": False
        }
    else:
        trace = dict(
        type='scatter',
        text=text,
        x=ttt[0],
        y=ttt[1],
        mode='markers',
        name=ttt[2],
        marker=dict(
            color=colors[int(cnt*2.5)],
            size=size,
            line=dict(
                color='rgba(217, 217, 217, 0.14)',
                width=0.5),
            opacity=0.8)
        )
    cnt+=1
    data.append(trace)




fig = go.Figure(
    data=data,
    layout_title_text="steady_state_voltage_stimend_3.0x versus AP1DelayMeanTest_3.0x"
)#,

fig.update_layout(
    autosize=False,
    width=1050,
    height=1050
)

if GITHUB:
    fig.show("svg")
else:
    pio.show(fig)
−6−4−202−4−2024Allen Brain Experimental Datamodelsoptimized modelsGouwens modelsMarkram modelsLayer 4 aspiny 313862167Layer 4 spiny 479728896Blue Brain Project Ephys Datasteady_state_voltage_stimend_3.0x versus AP1DelayMeanTest_3.0x
In [60]:
bbp_idx_labels
text = df[df.index.isin(bbp_idx_labels)].index
text;
In [61]:
# Do a TSNE embedding in two dimensions

tsne = TSNE(n_components=2, perplexity=30)
tsne.fit(df.values)
x = tsne.embedding_.T


plt.clf()
fig = plt.figure(figsize=(20,20))
ax = plt.subplot(111)
plt.scatter(x[0,experiment_idx_labels],x[1,experiment_idx_labels],c='blue',cmap='rainbow',label='data')
plt.scatter(x[0,model_index_labels],x[1,model_index_labels],c='red',cmap='rainbow',label='models')
plt.scatter(x[0,new_model_labels],x[1,new_model_labels],c='green',cmap='rainbow',label='optimized models')
plt.scatter(x[0,gouwens_idx_labels],x[1,gouwens_idx_labels],c='black',cmap='rainbow',label='Gouwens models')
plt.scatter(x[0,markram_idx_labels],x[1,markram_idx_labels],c='orange',s=100, cmap='rainbow',label='Markram models')

plt.scatter(x[0,experiment_idx_labels][42],x[1,experiment_idx_labels][42],s=700,c='purple', alpha=0.3,cmap='rainbow',label='Layer 4 aspiny 313862167')

plt.scatter(x[0,gouwens_idx_labels][-7],x[1,gouwens_idx_labels][-7],s=700,c='purple', alpha=0.3,cmap='rainbow',label=' layer 4 spiny 479728896')
plt.scatter(x[0,bbp],x[1,bbp],s=90, c='purple',cmap='rainbow',label='BBP')

#except:
#    pass
legend = ax.legend()#handles, labels, loc="upper right", title="Sizes")
plt.show()
<Figure size 432x288 with 0 Axes>

We can use cluster grams for two things.

  1. which cells are clustered with other cells?

  2. which features are clustered with other features?

Below is

  1. which cells are clustered with other cells?

Unfortunately this is hard to understand, it may be possible to truncate the dendrogram tree

In [62]:
#sns.clustermap(data=df.values);

Below is:

  1. which features are clustered with other features?

now lets look at how clustered the tests are.

In [63]:
used_columns = df.columns
# Create a categorical palette to identify the networks
pal = sns.husl_palette(8, s=.45)
lut = dict(zip(map(str, used_columns), pal))

# Convert the palette to vectors that will be drawn on the side of the matrix
#networks = df.columns.get_level_values("network")
colors = pd.Series(used_columns, index=df.columns).map(lut)

# Draw the full plot
sns.clustermap(df.corr(), center=0, cmap="vlag",
               row_colors=colors, col_colors=colors,
               linewidths=.75, figsize=(25, 25))
Out[63]:
<seaborn.matrix.ClusterGrid at 0x1c1a8b5dd8>
In [64]:
temp = df.T.corr()
temp;
#sns.clustermap(temp, figsize=(25, 25))
used_columns = df.index
# Create a categorical palette to identify the networks
pal = sns.husl_palette(8, s=.45)
lut = dict(zip(map(str, used_columns), pal))

# Convert the palette to vectors that will be drawn on the side of the matrix
#networks = df.columns.get_level_values("network")
colors = pd.Series(used_columns, index=temp.index).map(lut)

# Draw the full plot
sns.clustermap(temp, center=0, cmap="vlag",
               row_colors=colors, col_colors=colors, figsize=(25,25))
Out[64]:
<seaborn.matrix.ClusterGrid at 0x1c1a9babe0>
In [ ]:

In [65]:
'''
from plotly.figure_factory import create_dendrogram
df_ = pd.DataFrame(temp)
labels=df_.index
labels
#fig = ff.create_dendrogram(X, orientation='left', labels=names)
fig = create_dendrogram(df_,labels=df_.index, orientation='left')
fig.update_layout(width=800, height=8000)

if GITHUB:
    fig.show("svg")
else:
    pio.show(fig)
'''   
Out[65]:
'\nfrom plotly.figure_factory import create_dendrogram\ndf_ = pd.DataFrame(temp)\nlabels=df_.index\nlabels\n#fig = ff.create_dendrogram(X, orientation=\'left\', labels=names)\nfig = create_dendrogram(df_,labels=df_.index, orientation=\'left\')\nfig.update_layout(width=800, height=8000)\n\nif GITHUB:\n    fig.show("svg")\nelse:\n    pio.show(fig)\n'
In [66]:
import plotly.graph_objects as go
import plotly.figure_factory as ff

import numpy as np
from scipy.spatial.distance import pdist, squareform


# get data
#data = np.genfromtxt("http://files.figshare.com/2133304/ExpRawData_E_TABM_84_A_AFFY_44.tab",
#                     names=True,usecols=tuple(range(1,30)),dtype=float, delimiter="\t")
#data_array = data.view((np.float, len(data.dtype.names)))
#data_array = data_array.transpose()
#labels = data.dtype.names
df_ = pd.DataFrame(temp)
data_array = df_.values
labels = df_.index

# Initialize figure by creating upper dendrogram
fig = ff.create_dendrogram(data_array, orientation='bottom', labels=labels)
for i in range(len(fig['data'])):
    fig['data'][i]['yaxis'] = 'y2'

# Create Side Dendrogram
dendro_side = ff.create_dendrogram(data_array, orientation='right')
for i in range(len(dendro_side['data'])):
    dendro_side['data'][i]['xaxis'] = 'x2'

# Add Side Dendrogram Data to Figure
for data in dendro_side['data']:
    fig.add_trace(data)

# Create Heatmap
dendro_leaves = dendro_side['layout']['yaxis']['ticktext']
dendro_leaves = list(map(int, dendro_leaves))
data_dist = pdist(data_array)
heat_data = squareform(data_dist)
heat_data = heat_data[dendro_leaves,:]
heat_data = heat_data[:,dendro_leaves]

heatmap = [
    go.Heatmap(
        x = dendro_leaves,
        y = dendro_leaves,
        z = heat_data,
        colorscale = 'Blues'
    )
]

heatmap[0]['x'] = fig['layout']['xaxis']['tickvals']
heatmap[0]['y'] = dendro_side['layout']['yaxis']['tickvals']

# Add Heatmap Data to Figure
for data in heatmap:
    fig.add_trace(data)

# Edit Layout
fig.update_layout({'width':1800, 'height':1800,
                         'showlegend':False, 'hovermode': 'closest',
                         })
# Edit xaxis
fig.update_layout(xaxis={'domain': [.15, 1],
                                  'mirror': False,
                                  'showgrid': False,
                                  'showline': False,
                                  'zeroline': False,
                                  'ticks':""})
# Edit xaxis2
fig.update_layout(xaxis2={'domain': [0, .15],
                                   'mirror': False,
                                   'showgrid': False,
                                   'showline': False,
                                   'zeroline': False,
                                   'showticklabels': False,
                                   'ticks':""})

# Edit yaxis
fig.update_layout(yaxis={'domain': [0, .85],
                                  'mirror': False,
                                  'showgrid': False,
                                  'showline': False,
                                  'zeroline': False,
                                  'showticklabels': False,
                                  'ticks': ""
                        })
# Edit yaxis2
fig.update_layout(yaxis2={'domain':[.825, .975],
                                   'mirror': False,
                                   'showgrid': False,
                                   'showline': False,
                                   'zeroline': False,
                                   'showticklabels': False,
                                   'ticks':""})

if GITHUB:
    fig.show("svg")
else:
    pio.show(fig)
NMLCL000682NMLCL000699NMLCL000698NMLCL000700NMLCL000685NMLCL000687NMLCL000690NMLCL000696NMLCL000692NMLCL000691NMLCL000695NMLCL000974NMLCL001104NMLCL000945NMLCL000989NMLCL001120NMLCL001093NMLCL001098NMLCL001101NMLCL001115NMLCL000976NMLCL001112NMLCL001097NMLCL001113NMLCL001116NMLCL001007NMLCL000963NMLCL000969NMLCL000971NMLCL000985NMLCL000983NMLCL001109NMLCL000947NMLCL000949NMLCL000957NMLCL000959NMLCL000948NMLCL000990NMLCL000950NMLCL000967NMLCL001117NMLCL000951NMLCL000956NMLCL001105NMLCL001452NMLCL001501NMLCL001484NMLCL001537NMLCL001027NMLCL001065NMLCL001056NMLCL001062NMLCL001033NMLCL001038NMLCL001012NMLCL001037NMLCL000964NMLCL000962NMLCL000975NMLCL000988NMLCL001532NMLCL000993NMLCL000994NMLCL000981NMLCL000995NMLCL001055NMLCL001066NMLCL001001NMLCL001015NMLCL000342NMLCL001069NMLCL001035NMLCL001046NMLCL000425NMLCL000435NMLCL000412NMLCL000413NMLCL000375NMLCL000402NMLCL001435NMLCL001430NMLCL001444NMLCL000710NMLCL000707NMLCL000711NMLCL000719NMLCL000717NMLCL000732NMLCL000749NMLCL000715NMLCL000748NMLCL000709NMLCL000713NMLCL000716NMLCL000202NMLCL000222NMLCL000257NMLCL000258NMLCL000262NMLCL000683NMLCL000684NMLCL000644NMLCL000677NMLCL000646NMLCL000667NMLCL000659NMLCL000668NMLCL000655NMLCL000658NMLCL000637NMLCL000671NMLCL000680NMLCL000670NMLCL000681NMLCL000645NMLCL000653NMLCL000676NMLCL000643NMLCL000673NMLCL000664NMLCL000672NMLCL000259NMLCL000223NMLCL000244NMLCL000211NMLCL000266NMLCL000246NMLCL000253NMLCL000104NMLCL000217NMLCL000238NMLCL000224NMLCL000261NMLCL000242NMLCL000243NMLCL001530NMLCL001535NMLCL001439NMLCL001443NMLCL000430NMLCL000433NMLCL000436NMLCL000444NMLCL000408NMLCL000409NMLCL000431NMLCL000438NMLCL000439NMLCL000449NMLCL000418NMLCL000434NMLCL000417NMLCL000419NMLCL000422NMLCL000424NMLCL000437NMLCL000448NMLCL000722NMLCL000737NMLCL000728NMLCL000723NMLCL000739NMLCL000746NMLCL000750NMLCL000727NMLCL000740NMLCL000724NMLCL000744NMLCL000745NMLCL001488NMLCL001489NMLCL001516NMLCL001463NMLCL001500NMLCL000706NMLCL000729NMLCL000735NMLCL000738NMLCL000210NMLCL000208NMLCL000209NMLCL000218NMLCL000229NMLCL000230NMLCL000250NMLCL000185NMLCL000178NMLCL000154NMLCL000189NMLCL000155NMLCL000187NMLCL000648NMLCL000651NMLCL001449NMLCL000130NMLCL000129NMLCL000105NMLCL000161NMLCL000111NMLCL000144NMLCL000134NMLCL000182NMLCL000151NMLCL000196NMLCL000174NMLCL000173NMLCL000176NMLCL000314NMLCL000311NMLCL000387NMLCL000389NMLCL000338NMLCL000372NMLCL000362NMLCL000366NMLCL000876NMLCL000888NMLCL000908NMLCL000926NMLCL000149NMLCL000179NMLCL000159NMLCL000133NMLCL000157NMLCL000343NMLCL000812NMLCL000852NMLCL000865NMLCL000884NMLCL000853NMLCL000813NMLCL000814NMLCL000831NMLCL000874NMLCL000755NMLCL000883NMLCL000297NMLCL000928NMLCL000364NMLCL000837NMLCL000309NMLCL000310NMLCL000318NMLCL000325NMLCL000348NMLCL000838NMLCL000355NMLCL000405NMLCL000127NMLCL000128NMLCL000165NMLCL000125NMLCL000136NMLCL000153NMLCL000096NMLCL000177NMLCL000190NMLCL000103NMLCL000109NMLCL000093NMLCL000100NMLCL001017NMLCL000283NMLCL000300NMLCL000382NMLCL000823NMLCL000374NMLCL000376NMLCL001064NMLCL001029NMLCL001074NMLCL001075NMLCL001003NMLCL001006NMLCL001078NMLCL001011NMLCL001040NMLCL001058NMLCL001034NMLCL001022NMLCL001014NMLCL001052NMLCL001103NMLCL001042NMLCL001019NMLCL000998NMLCL001032NMLCL001018NMLCL001068NMLCL001010NMLCL000341NMLCL000816NMLCL000942NMLCL001118NMLCL000407NMLCL000938NMLCL000952NMLCL000944NMLCL001094NMLCL000441NMLCL000943NMLCL001106NMLCL000344NMLCL000393NMLCL000317NMLCL000351NMLCL000855NMLCL000863NMLCL000873NMLCL000872NMLCL000913NMLCL000385NMLCL000866NMLCL001005NMLCL001025NMLCL001059NMLCL001047NMLCL001060NMLCL000337NMLCL001004NMLCL000904NMLCL000830NMLCL000841NMLCL000885NMLCL000799NMLCL000815NMLCL000797NMLCL000871NMLCL000918NMLCL000930NMLCL000792NMLCL000880NMLCL000760NMLCL000789NMLCL000826NMLCL000850NMLCL000925NMLCL000824NMLCL000796NMLCL000802NMLCL000835NMLCL000887NMLCL000919NMLCL000891NMLCL000917NMLCL000907NMLCL000929NMLCL000777NMLCL000828NMLCL000791NMLCL000931NMLCL000807NMLCL000847NMLCL001450NMLCL001495NMLCL001526NMLCL001433NMLCL001456NMLCL000702NMLCL000751NMLCL000201NMLCL000199NMLCL000200NMLCL000124NMLCL000703NMLCL000508NMLCL000510NMLCL000460NMLCL000463NMLCL000465NMLCL000458NMLCL000550NMLCL000516NMLCL000535NMLCL000483NMLCL000630NMLCL000457NMLCL000478NMLCL000506NMLCL000479NMLCL000515NMLCL000601NMLCL000555NMLCL000600NMLCL000574NMLCL000552NMLCL000598NMLCL000631NMLCL000587NMLCL000588NMLCL000452NMLCL000591NMLCL000629NMLCL000494NMLCL000553NMLCL000504NMLCL000558NMLCL000295NMLCL000294NMLCL000296NMLCL000121NMLCL000117NMLCL000120NMLCL000513NMLCL000604NMLCL000621NMLCL000583NMLCL000519NMLCL000533NMLCL000521NMLCL000529NMLCL000557NMLCL000615NMLCL000482NMLCL000493NMLCL000602NMLCL000592NMLCL000596NMLCL000623NMLCL000624NMLCL000636NMLCL000500NMLCL000542NMLCL000101NMLCL000520NMLCL000565NMLCL000578NMLCL000518NMLCL000610NMLCL000090NMLCL000091NMLCL000476NMLCL000594NMLCL000603NMLCL000474NMLCL000475NMLCL000512NMLCL000580NMLCL000563NMLCL000489NMLCL000495NMLCL000454NMLCL000618NMLCL000473NMLCL000490NMLCL000139NMLCL000171NMLCL000169NMLCL000168NMLCL000140NMLCL000141NMLCL000167NMLCL000499NMLCL000540NMLCL000302NMLCL000306NMLCL000278NMLCL000287NMLCL000097NMLCL000098NMLCL000471NMLCL000532NMLCL000110NMLCL000113NMLCL000099NMLCL000572NMLCL000560NMLCL000620NMLCL000857NMLCL000470NMLCL000567NMLCL000496NMLCL000543NMLCL000544NMLCL000285NMLCL000274NMLCL000359NMLCL000381NMLCL000378NMLCL000361NMLCL000377NMLCL000536NMLCL000809NMLCL000756NMLCL000752NMLCL000763NMLCL000313NMLCL000858NMLCL000860NMLCL000392NMLCL000315NMLCL000327NMLCL000273NMLCL000933NMLCL000934NMLCL000305NMLCL000307NMLCL000308NMLCL000399NMLCL000400NMLCL000281NMLCL000289NMLCL000280NMLCL000354NMLCL000279NMLCL000893NMLCL000894NMLCL000333NMLCL000336NMLCL000332NMLCL000334NMLCL000459NMLCL000818NMLCL000819NMLCL000821NMLCL000528NMLCL000584NMLCL000304NMLCL000328NMLCL000632NMLCL000562NMLCL000634NMLCL000526NMLCL000607NMLCL000608NMLCL000568NMLCL000612NMLCL000619NMLCL000626NMLCL000491NMLCL000617NMLCL000564NMLCL000622NMLCL000768NMLCL000783NMLCL000786NMLCL000497NMLCL000501NMLCL000566NMLCL000593NMLCL000611NMLCL000290NMLCL000772NMLCL000775NMLCL000559NMLCL000825NMLCL000569NMLCL000766NMLCL000890NMLCL000767NMLCL000779NMLCL000352NMLCL000754NMLCL000758NMLCL000769NMLCL001020NMLCL000757NMLCL000927NMLCL000761NMLCL000935NMLCL000899NMLCL000901NMLCL000770NMLCL000846NMLCL001447NMLCL001083NMLCL001084NMLCL001002NMLCL001087NMLCL001088NMLCL001090NMLCL001091NMLCL001476NMLCL001477NMLCL001504NMLCL001542NMLCL001470NMLCL001521NMLCL001498NMLCL001506NMLCL001442NMLCL001508NMLCL000326NMLCL001514NMLCL000869NMLCL000803NMLCL000805NMLCL001140NMLCL001127NMLCL001594NMLCL000255NMLCL000194NMLCL000661NMLCL000221NMLCL000234NMLCL000640NMLCL001454NMLCL000363NMLCL000426NMLCL001124NMLCL001125NMLCL000180NMLCL001123560738452588699048501895840519220630581157315526936091561519381505804754505804754505804754505804754555345752565866518562432852562432852585951863585951863569270348585805211596609577581058351581058351561814552593618144580895033583403073583403073503254105584670484585841870511247944585947309585947309566761793566719275573404307579672464580537822541536216541536216541536216541536216593436526589760138589760138565120091565120091545193484586088125513514828528706755529822280570924514505689242530097064525011903525011903525011903525011903516916566529988200518271679589452470589452470559221786584872371579652000508309649563350008579981789571306508579662784579662784562185938509003464523729022522696638569477234502100978580537822580537822564410881578485753527463935519220630519220630581340781502267531502267531502267531502267531503595332530062268562630201560748672560748672560748672560748672560599530560678143513879001545604343574036994503595332503595332503595332569073509569810649569810649501730497502367941502367941502367941502367941539563223539563223539563223539563223507101551528638057504615116508421103585740198609775336609775336524693113569958754508400049570098593570098593526785742586073850586073850519220630526939383515986607584271064528706755528706755564438523514824979504615116504615116580537822515435668528617665558583839505847801508887004504615116526732858513800575588711875515305346580161872522723828528706755565871856527095729566563603527874431593448793510715606522418855560686477580145037528038990563226105569749430571314481571314481609435731569457730586358539564346637566466395569998790571715883329554301323777372328535908483020137479728896485889190470098860470098860470098860470098860322536988469864405483092310485905411486029942489754156486560376325479788325479788325479788325479788323838579487445455320836481369697038484744672480011594466705337324065524480351780314822529486502127320654829486235288585080288585080288475585413478396248476266853488689403490263438476754635478793814313861539474283475480169202482773968488679042397243949482516631485053941475549334485182007320247631466114829473543792475580568323452196475550148484744867471088062486791945490382353482690728490746462488491756488491756488491756488491756484629749475895240475895240475895240475895240485184849484742372484742372318553131483111429486755781321704734488438177481093525320466776469801138481127932481127932481127932481127932396963717313862167313862167313862167313862167478795872475549284475549284475549284475549284320455763323540736482773968469753383477278032490283114439372920328031983323834998489923501476099282439372920476615310439372920439372920422055317469798159482773968480169202488679042482773968480169202488679042488681087490376252481099785478716304475622680490912243480169202488679042321906005488428837488428837488428837488428837464079491478586425486176465486776965488688374322781822479225294490733962324521027471767045486111903486893033586071425322762730328015342481003643NMLCL001459NMLCL001493NMLCL001534NMLCL001481NMLCL001541501736631476104386476104386476104386476104386NMLCL001457NMLCL001475NMLCL001527509683388583426723524074295528638213NMLCL001650NMLCL001438NMLCL001512NMLCL001492NMLCL000398NMLCL000370NMLCL000401NMLCL001441NMLCL001482NMLCL001432NMLCL001445NMLCL001513NMLCL001468NMLCL001533NMLCL000322NMLCL000347NMLCL000241NMLCL000264NMLCL000122NMLCL000657518750800NMLCL001660NMLCL000148NMLCL001138480087928480087928480087928480087928476178907486025194479993900479993900479993900479993900480777144482645063326737743325480372486198953491026389483099796396903166468120757322246422479091820479091820479091820479091820479721491486262299486262299486262299486262299489885173478447600486196663486196663486196663486196663367567889397353539485577686471758398580632959484564503324466858476135066476135066476135066476135066476093220482735425477880128327970859327970859327970859327970859471129934322197295482726749477127614477127614477127614477127614484515166484515166484515166484515166313861677479492633479492633479492633479492633469734859476053392487446992486108147488680211318733871490205998486146717370351753490267468476269122476269122476269122476269122322723785478500297480187546480122859485468180329554392320207387490718897464326158486508983486508983486508983486508983482493761473909767475623964475623964475623964475623964488674910490149337328954223329224678329224678329224678329224678481127348478947222475894121475894121475894121475894121485245025480735016487176969487176969487176969487176969488708322480714960480714960480714960480714960476051620482764620482764620482764620571314481571314481313862167313862167313862167313862167313862167313862167313862167313862167313862167313862167313862167313862167482764620571314481571314481A88NMLCL000346A13A84476104386476104386476104386476104386571314481571314481581403121529807751NMLCL001455NMLCL001428NMLCL001472NMLCL001540505500191566724483529908238529908238529908238529908238502614426571292875571292875584280397502999078518238899539540199513803975548408551515949222526785799555354254566720165501839495504567071547262585502887600502245101517974394517974394517974394517974394488386626580014328596619013527891102527891102527891102527891102527942865531523709520470657548473051537307503531520401531520401531520401531520401575642695585753841562062132566671538566671538488386431571255047517673596582644380583822407NMLCL000507NMLCL000509NMLCL000462NMLCL000464NMLCL000590NMLCL000628NMLCL000484NMLCL000556NMLCL000599NMLCL000477NMLCL000782NMLCL000784NMLCL000480NMLCL000541NMLCL000514NMLCL000582NMLCL000488NMLCL000585NMLCL000455NMLCL000456NMLCL000461NMLCL000531NMLCL000613NMLCL000335NMLCL000771NMLCL000774NMLCL000778NMLCL000820NMLCL000282NMLCL000781NMLCL000293NMLCL000537NMLCL000595NMLCL000468NMLCL000503NMLCL000546NMLCL000616NMLCL000522NMLCL000523NMLCL000525NMLCL000469NMLCL000472NMLCL000087NMLCL000089NMLCL000118NMLCL000119NMLCL000330NMLCL000485NMLCL000547NMLCL000492NMLCL000625NMLCL000586NMLCL000605NMLCL000606NMLCL000524NMLCL000548NMLCL000138NMLCL000170NMLCL000545NMLCL000614NMLCL000371NMLCL000397NMLCL000272NMLCL000320NMLCL000331NMLCL000895NMLCL000275NMLCL000277NMLCL000765NMLCL000801NMLCL000898NMLCL000762NMLCL000798NMLCL000804NMLCL000806NMLCL000879NMLCL000922NMLCL000291NMLCL000940NMLCL000859NMLCL000776NMLCL000822NMLCL001085NMLCL001086NMLCL001082NMLCL001089NMLCL000197NMLCL000213NMLCL000357NMLCL000380NMLCL000379NMLCL000358NMLCL000360NMLCL000303NMLCL000329NMLCL000112NMLCL000892NMLCL000932NMLCL000486NMLCL000861572410577528042185623427407592952680592952680515188639517345160545206633508282493527931217502193385520636424515511110528671590501951290501951290501951290501951290513510227513510227513510227513510227502383543513593674513593674513593674513593674502993826506133241565888394584561230565871768565871768NMLCL001522NMLCL001434NMLCL001483NMLCL001507NMLCL000517NMLCL000834NMLCL001051NMLCL000794NMLCL000911NMLCL000800NMLCL000827NMLCL000916NMLCL000836NMLCL000921NMLCL000795NMLCL001041NMLCL000773NMLCL000811NMLCL000889NMLCL000862NMLCL000793NMLCL000840NMLCL001510NMLCL000939NMLCL001036NMLCL000705NMLCL000704NMLCL000720NMLCL000162NMLCL000150NMLCL000116NMLCL000576NMLCL000095NMLCL000126NMLCL000146NMLCL000115NMLCL000142NMLCL000106NMLCL000108NMLCL000181NMLCL000324NMLCL000373NMLCL000298NMLCL000353NMLCL000395NMLCL000312NMLCL000345NMLCL000301NMLCL000299NMLCL000390NMLCL001024NMLCL000910NMLCL000845NMLCL000881NMLCL000854NMLCL000867NMLCL000829NMLCL000868NMLCL000977NMLCL000978NMLCL001099NMLCL001114NMLCL000410NMLCL000979NMLCL001102NMLCL001111NMLCL001026NMLCL001067NMLCL000966NMLCL000972NMLCL001023NMLCL001054NMLCL001079NMLCL001061NMLCL000997NMLCL001000NMLCL000842NMLCL001076NMLCL000844NMLCL000909NMLCL000839NMLCL000843NMLCL000903NMLCL000902NMLCL000905NMLCL000875NMLCL000906NMLCL000886NMLCL000920NMLCL000205NMLCL000226NMLCL000268NMLCL000132NMLCL000163NMLCL000267NMLCL000225NMLCL000245NMLCL000708NMLCL000718NMLCL000733NMLCL000741NMLCL000712NMLCL000734NMLCL000736NMLCL000714NMLCL000747NMLCL000652NMLCL000647NMLCL000656NMLCL000678NMLCL000131NMLCL000107NMLCL000186NMLCL000247NMLCL000231NMLCL000249NMLCL000160NMLCL000164NMLCL000191NMLCL000188NMLCL000184NMLCL000193NMLCL000396NMLCL000404NMLCL000391NMLCL000276NMLCL000388NMLCL000914NMLCL000394NMLCL000882NMLCL000339NMLCL001016NMLCL001071NMLCL000365NMLCL000878NMLCL001049NMLCL000999NMLCL001009NMLCL001048NMLCL001028NMLCL001053NMLCL001013NMLCL001030NMLCL000638NMLCL000912NMLCL000254NMLCL000232NMLCL000236NMLCL000639NMLCL000641NMLCL000662NMLCL000650NMLCL000654NMLCL000694NMLCL000669NMLCL000679NMLCL000642NMLCL000660NMLCL000697NMLCL000686NMLCL000701NMLCL000156NMLCL000674NMLCL000675NMLCL000260NMLCL000206NMLCL000237NMLCL000263NMLCL000269NMLCL000256NMLCL000265NMLCL000073NMLCL000742NMLCL000725NMLCL000743A89NMLCL001440NMLCL001136NMLCL001142NMLCL001141NMLCL000954NMLCL001095NMLCL001119NMLCL001108NMLCL000984NMLCL000982NMLCL000986NMLCL001092NMLCL000955NMLCL000946NMLCL000953NMLCL000968NMLCL000958NMLCL000960NMLCL000961NMLCL000970NMLCL000429NMLCL000411NMLCL000420NMLCL000415NMLCL000451NMLCL000414NMLCL000450NMLCL001039NMLCL001031NMLCL001073NMLCL001043NMLCL001044NMLCL001448NMLCL000443NMLCL000446NMLCL001524NMLCL001478NMLCL001497NMLCL001525NMLCL001539NMLCL001467NMLCL001519B4B77B28B50B76A66A22A51B82B9B8B23A15B93B22A80B27NMLCL001538B24NMLCL001461A77B94B95A91B18A82NMLCL001469B90A86B47NMLCL001515A100A85B15B39B19A53B78NMLCL001139NMLCL001531NMLCL001437NMLCL001458NMLCL001657A90NMLCL000123A87NMLCL000102NMLCL000219010203040
In [67]:
#fig.show()
In [68]:
'''
from plotly.figure_factory import create_dendrogram
df_ = pd.DataFrame(iso.T)
fig = create_dendrogram(df_)

if GITHUB:
    fig.show("svg")
else:
    pio.show(fig)
'''
Out[68]:
'\nfrom plotly.figure_factory import create_dendrogram\ndf_ = pd.DataFrame(iso.T)\nfig = create_dendrogram(df_)\n\nif GITHUB:\n    fig.show("svg")\nelse:\n    pio.show(fig)\n'
In [69]:
#df[bbp];
In [70]:
import plotly.express as px
import plotly.graph_objects as go

data = []
trace0=(x[0,experiment_idx_labels],x[1,experiment_idx_labels],'Allen Brain Ephys Data',experiment_idx)
trace1=(x[0,model_index_labels],x[1,model_index_labels],'models',model_idx)
trace2=(x[0,new_model_labels],x[1,new_model_labels],'optimized models',new_models_idx)
trace3=(x[0,gouwens_idx_labels],x[1,gouwens_idx_labels],'Gouwens models',gouwens_idx)
trace4=(x[0,markram_idx_labels],x[1,markram_idx_labels],'Markram models',markram_idx)
trace7=(x[0,bbp],x[1,bbp],'Blue Brain Project Ephys Data',bbp_idx_labels)

trace5=([x[0,experiment_idx_labels][42]],[x[1,experiment_idx_labels]],'Layer 4 aspiny 313862167',experiment_idx_labels)
trace6=([x[0,gouwens_idx_labels][-7]],[x[1,gouwens_idx_labels][-7]],'Layer 4 spiny 479728896',gouwens_idx_labels)

# trace7=([iso[0,gouwens_idx_labels][-7]],[iso[1,gouwens_idx_labels][-7]],'Layer 4 spiny 479728896',gouwens_idx_labels)
#plt.scatter(iso[0,bbp_labels],iso[1,bbp_labels],s=10, c='purple',cmap='rainbow',label='BBP')
trace7=(x[0,bbp],x[1,bbp],'Blue Brain Project Ephys Data',df[bbpindex].index.values)

classif = OneVsRestClassifier(SVC(kernel='linear'))
classif.fit(x.T, groundtruth)
min_x = np.min(x[0, :])
max_x = np.max(x[0, :])
w = classif.coef_[0]
a = -w[0] / w[1]
xx = np.linspace(min_x, max_x)  # make sure the line is long enough
yy = a * xx - (classif.intercept_[0]) / w[1]



used_columns = df.columns
pal = sns.husl_palette(8, s=.45)
lut = dict(zip(map(str, used_columns), pal))
colors = pd.Series(used_columns, index=df.columns).map(lut)


traces = [trace0,trace1,trace2,trace3,trace4,trace5,trace6,trace7,None]
cnt=4
theme = px.colors.diverging.Portland
for i,ttt in enumerate(traces):
    if cnt==len(theme):
        cnt=0
    if i>1 and i!=7:
        size = 12
    else:
        size = 6
        
    if i==8:
        trace = {
          "line": {
            "dash": "solid", 
            "color": "rgb(255,0,0)", 
            "shape": "linear", 
            "width": 2
          }, 
          "mode": "lines", 
          "name": "Decision Boundary", 
          "text": "Decision Boundary", 
          "type": "scatter", 
          "x": xx, 
          "y": yy,
          "yaxis": "y1", 
          "showlegend": False
        }
    else:
        trace = dict(
            type='scatter',
            text = df[df.index.isin(ttt[3])].index,
            x=ttt[0],
            y=ttt[1],
            mode='markers',
            name=ttt[2],
            marker=dict(
                color=theme[cnt],
                size=size,
                line=dict(
                    color='rgba(217, 217, 217, 0.14)',
                    width=0.5),
                opacity=0.8)
        )
    data.append(trace)
    cnt+=1


layout = go.Layout(yaxis=dict(range=[-50, 50]))
fig = go.Figure(
    data=data,
    layout_title_text="steady_state_voltage_stimend_3.0x versus AP1DelayMeanTest_3.0x", layout=layout
)

#layout = {'scene': {'xaxis': {'showspikes': False}}}
fig.update_layout(
    autosize=False,
    width=1050,
    height=1050
)

if GITHUB:
    fig.show("svg")
else:
    pio.show(fig)
−40−2002040−40−2002040Allen Brain Ephys Datamodelsoptimized modelsGouwens modelsMarkram modelsLayer 4 aspiny 313862167Layer 4 spiny 479728896Blue Brain Project Ephys Datasteady_state_voltage_stimend_3.0x versus AP1DelayMeanTest_3.0x

t-SNE

The TSNE plot does a better job of spatially sperating experimental data from theoretical models in dimension reduced Druckman feature space.

In [71]:
new_model_labels
Out[71]:
array([ True,  True,  True, ..., False, False, False])

*Finally we examine the dimension that contributes the most to cluster seperation by looking at variance explained. This gives us an educated guess about dimensions that contribute the most weight to the axis of the PCA projection spaces plotted above.

Note to self, experimental_index needs updtating to include BBP cells.

In [72]:
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

df_models = df[~df.index.isin(experiment_idx)]
df_data = df[df.index.isin(experiment_idx)]
# Assume they have the same columns

df_combined = pd.concat([df_data, df_models])

groundtruth = np.array(df.index.isin(experiment_idx))
rfc = RandomForestClassifier()
X = df_combined.values
rfc.fit(X, groundtruth)

importances = pd.Series(index = df_combined.columns, data=rfc.feature_importances_)
groundtruth[-9:-1]

print(importances.sort_values(ascending=False)[0:9])
ohmic_input_resistance_1.5x             0.217
InputResistanceTest_3.0x                0.185
ohmic_input_resistance_3.0x             0.138
InputResistanceTest_1.5x                0.113
AP1RateOfChangePeakToTroughTest_3.0x    0.066
fast_trough_index_3.0x                  0.047
ISIBurstMeanChangeTest_3.0x             0.042
threshold_index_3.0x                    0.041
ISICVTest_3.0x                          0.040
dtype: float64
In [73]:
'''
try:
    df = df.drop(columns = ['ohmic_input_resistance_3.0x'])
    df = df.drop(columns = ['ohmic_input_resistance_1.5x'])
    df = df.drop(columns = ['InputResistanceTest_1.5x'])
    df = df.drop(columns = ['InputResistanceTest_3.0x'])
except:
    pass


df = df.drop(columns = ['ohmic_input_resistance_1.5x'])
'''

#ohmic_input_resistance_3.0x             0.387
#sag_ratio1_3.0x                         0.151
#ohmic_input_resistance_1.5x             0.104
#AP1RateOfChangePeakToTroughTest_3.0x    0.070
#InputResistanceTest_3.0x                0.054

df_models = df[~df.index.isin(experiment_idx)]
df_data = df[df.index.isin(experiment_idx)]
# Assume they have the same columns

df_combined = pd.concat([df_data, df_models])

groundtruth = np.array(df.index.isin(experiment_idx))
rfc = RandomForestClassifier()
X = df_combined.values
rfc.fit(X, groundtruth)

importances = pd.Series(index = df_combined.columns, data=rfc.feature_importances_)
groundtruth[-9:-1]

print(importances.sort_values(ascending=False)[0:9])
sag_ratio1_3.0x                0.218
InputResistanceTest_3.0x       0.206
ohmic_input_resistance_3.0x    0.131
InputResistanceTest_1.5x       0.104
ohmic_input_resistance_1.5x    0.086
fast_trough_index_3.0x         0.075
peak_index_3.0x                0.060
AP_rise_time_1.5x              0.044
AP2DelaySDTest_1.5x            0.009
dtype: float64

make four regions for ground truth

for a region dependent classification with four regions (0,1,2,3)

Gouwens and Allen 0 Markram and BBP 1 Hippocampus 2 Thalamocortical 3

In [74]:
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

df_models = df[~df.index.isin(experiment_idx)]
df_data = df[df.index.isin(experiment_idx)]
# Assume they have the same columns

df_combined = pd.concat([df_data, df_models])

groundtruth = np.array(df.index.isin(experiment_idx))
rfc = RandomForestClassifier()
X = df_combined.values
rfc.fit(X, groundtruth)

importances = pd.Series(index = df_combined.columns, data=rfc.feature_importances_)
groundtruth[-9:-1]

print(importances.sort_values(ascending=False)[0:9])
InputResistanceTest_3.0x                0.191
AP1RateOfChangePeakToTroughTest_3.0x    0.138
ohmic_input_resistance_1.5x             0.119
fast_trough_index_3.0x                  0.088
sag_ratio1_3.0x                         0.075
InputResistanceTest_1.5x                0.063
threshold_index_3.0x                    0.050
ISIBurstMeanChangeTest_3.0x             0.048
peak_index_3.0x                         0.048
dtype: float64
In [75]:
print('InputResistanceTest_3.0x' in df_data.columns)
print('InputResistanceTest_3.0x' in df_models.columns)
print('InputResistanceTest_3.0x' in df_combined)
True
True
True
In [108]:
fig, ax = plt.subplots(figsize=(25, 25))


ax = sns.kdeplot(df['AP1RateOfChangePeakToTroughTest_3.0x'], df['sag_ratio1_3.0x'],
                 cmap="Blues", shade=True,bw=.15)


#plt.show()
In [106]:
ax = sns.kdeplot(df_models['AP1RateOfChangePeakToTroughTest_3.0x'], df_models['sag_ratio1_3.0x'],
                 cmap="Blues", shade=True,bw=.15)
In [109]:
ax = sns.kdeplot(df_data['AP1RateOfChangePeakToTroughTest_3.0x'], df_data['sag_ratio1_3.0x'],
                 cmap="Reds", shade=False,bw=.15)
In [76]:
plt.clf()
plt.scatter(df_data['InputResistanceTest_1.5x'], df_data['ohmic_input_resistance_3.0x'],label='experimental data')
plt.scatter(df_models['InputResistanceTest_1.5x'], df_models['ohmic_input_resistance_3.0x'],label='models')

plt.xlabel('InputResistanceTest_1.5x')
plt.ylabel('ohmic_input_resistance_3.0x')
plt.legend()
plt.show()
In [77]:
plt.clf()
plt.scatter(df_data['sag_ratio1_3.0x'], df_data['ohmic_input_resistance_3.0x'],label='experimental data')
plt.scatter(df_models['sag_ratio1_3.0x'], df_models['ohmic_input_resistance_3.0x'],label='models')

plt.xlabel('sag_ratio1_3.0x')
plt.ylabel('ohmic_input_resistance_3.0x')
plt.legend()
plt.show()
In [79]:
#from sklearn import preprocessing
#df_data_b = df[bbp]
#df_data_b[:] = preprocessing.normalize(df_data_b.values, norm='l2')
#experiment_df[:] = preprocessing.normalize(experiment_df.values, norm='l2')
#temp_bbpindex = [i for i,idx in enumerate(df_data.index.values) if idx in stable_list]
#temp_bbpindex
'''
df_data = pd.concat([experiment_df,df_data_b])
#temp_allenindex = list(range(len(temp_bbpindex),len(df_data)))
plt.clf()


plt.scatter(df_data['sag_ratio1_3.0x'], df_data['AP1RateOfChangePeakToTroughTest_3.0x'],label='BBP experimental Cells')
plt.scatter(experiment_df['sag_ratio1_3.0x'], experiment_df['AP1RateOfChangePeakToTroughTest_3.0x'],label='Allen experimental Cells')
#plt.scatter(df_o_m['steady_state_voltage_stimend_3.0x'], df_o_m['AP1DelayMeanTest_3.0x'],label='models')

plt.xlabel('sag_ratio1_3.0x')
plt.ylabel('AP1RateOfChangePeakToTroughTest_3.0x')
plt.legend()
plt.show()
'''
Out[79]:
"\ndf_data = pd.concat([experiment_df,df_data_b])\n#temp_allenindex = list(range(len(temp_bbpindex),len(df_data)))\nplt.clf()\n\n\nplt.scatter(df_data['sag_ratio1_3.0x'], df_data['AP1RateOfChangePeakToTroughTest_3.0x'],label='BBP experimental Cells')\nplt.scatter(experiment_df['sag_ratio1_3.0x'], experiment_df['AP1RateOfChangePeakToTroughTest_3.0x'],label='Allen experimental Cells')\n#plt.scatter(df_o_m['steady_state_voltage_stimend_3.0x'], df_o_m['AP1DelayMeanTest_3.0x'],label='models')\n\nplt.xlabel('sag_ratio1_3.0x')\nplt.ylabel('AP1RateOfChangePeakToTroughTest_3.0x')\nplt.legend()\nplt.show()\n"
In [80]:
def crawl_ids(url):
    ''' move to aibs '''
    all_data = requests.get(url)
    all_data = json.loads(all_data.text)
    Model_IDs = []
    for d in all_data:
        Model_ID = str(d['Model_ID'])
        Model_IDs.append(Model_ID)
    return Model_IDs

import glob 
import pickle
import json 
import requests
#import get_three_feature_sets_from_nml_db as runnable_nml
#from get_three_feature_sets_from_nml_db import analyze_models_from_cache
#from neuronunit.get_three_feature_sets_from_nml_db import crawl_ids
def download_intensives():


    list_to_get =[ str('https://neuroml-db.org/api/search?q=traub'),
        str('https://neuroml-db.org/api/search?q=markram'),
        str('https://neuroml-db.org/api/search?q=Gouwens') ]
    regions = {}
    for url in list_to_get:
        Model_IDs = crawl_ids(url)
        for Model_ID in Model_IDs:
            url = str("https://neuroml-db.org/api/model?id=")+Model_ID
            try:            
                model_contents = requests.get(url)
                model_contents = json.loads(model_contents.text)
                regions[Model_ID] = model_contents['keywords'][0]['Other_Keyword_term']

            except:
                pass

            #print(regions)
            with open('regions.p','wb') as f:
                pickle.dump(regions,f)
    return regions
                
#regions = download_intensives()      
In [96]:
 
In [82]:
# SQL output is imported as a dataframe variable called 'df'
#import pandas as pd
#import plotly.plotly as py
import plotly.graph_objs as go

# 3D scatter plot. Resource: https://plot.ly/python/3d-scatter-plots/
theme = px.colors.diverging.Portland

experimental_data = go.Scatter3d(
    x = df_data['sag_ratio1_3.0x'],
    y = df_data['AP1RateOfChangePeakToTroughTest_3.0x'],
    z = df_data['ohmic_input_resistance_3.0x'],
    mode ='markers',
    name = 'experimental data',
    marker =dict(
      color = theme[0],
      size = 3,
      opacity = 0.9
    )
)




"""

opt_models = go.Scatter3d(
    x = df_o_m3['AP_begin_voltage_1.5x'],
    y = df_o_m3['AP1AHPDepthTest_1.5x'],
    z = df_o_m3['decay_time_constant_after_stim_3.0x'],
    mode ='markers',
    name = 'bad Model',
    marker =dict(
      color = theme[1],
      size = 3,
      opacity = 0.9
    )
)
"""


models = go.Scatter3d(
    x = experiment_df['sag_ratio1_3.0x'],
    y = experiment_df['AP1RateOfChangePeakToTroughTest_3.0x'],
    z = experiment_df['ohmic_input_resistance_3.0x'],
    mode ='markers',
    name = 'Models',
    marker =dict(
      color = theme[1],
      size = 3,
      opacity = 0.9
    )
)

data = [experimental_data , models]
layout = go.Layout(
   scene = dict(xaxis = dict(title='AP_begin_voltage_1.5x'),
                yaxis = dict(title='AP1AHPDepthTest_1.5x'),
                zaxis = dict(title='decay_time_constant_after_stim_3.0x'),),
    margin=dict(
        l=10,
        r=10,
        b=10,
        t=10
    )
)
fig = go.Figure(data=data, layout=layout)
df_o_m3
# Use Periscope to visualize a dataframe, text, or an image by passing data to periscope.table(), periscope.text(), or periscope.image() respectively.
#py.show(fig)
#py.plot(fig, filename='data_only_3D_scatter.html') 


if GITHUB:
    fig.show("svg")
else:
    fig.show()
experimental dataModels

As I wrote above Random Forest Variance explained, tells us the dimensions of the Druckman feature space that most strongly assist in classifying models versus data. When we identify features that seperate models and data using Variance Explained, we are then able to iteratively variables that contribute more heavily to data variance. We can remove variables that explain most variance, until machine classification can no longer tell models and data apart, leaving us with a small list of tests which models need to perform better on, these tests correspond to measurable electrical properties of cells that need to be better aligned with data.

Two such measurable electrical properties are Druckman features with high variance explained are re: "Input Resistance" in models and data, as well as "AP2RateOfChangePeakToTroughTest". This second dimension called AP2RateOfChangePeakToTroughTest means: when considering multi-spiking waveforms observed, at the second Action Potential/Spike, how rapid is the average decay from peak to trough of the second AP wave. Since Action Potential wave attack and decay shapes are non-linear, the instantaneous gradient from the peak of the wave is not informative, and it is more useful to measure the time interval needed needed for a decay from a spike, to a state of hyperpolarization, corresponding to a neurons "refractory-period".

Already, we are have arrived at useful information, pertaining to the point of the exercise, as we now have a small list of electrical tests, that we want optimized models to perform better on, such that models and data will be more aligned with each other.

As neural modelers with a great interest in mimicing a diverse range of experimental data using models. The least convincing aspects of our models as mimics of data, are these top ten features. In other words the least convincing aspects of our models are: AP2RateOfChangePeakToTroughTest, Input Resistance values (a scalar), and the amplitude of the first and second spike.

Prediction Results When I Use Random Forests on all 51 feaature dimensions

  • remember that our ground truth labels are booleans that are defined like this: groundtruth = np.array(df_combined.index.isin(experiment_idx)) Which is labeled as "True" for this data point is an experiment, and "False" for this data point is a model. Machine classification can successfuly discrimate that our optimized cells are models and not data (this is bad news for us).

In this context in order to bolster out optimized models, a high false-negative rate is desirable. Unfortunately for us, that is not what we see. The Random Forest Classifier (RFC) correctly identies that all 11 of the new optimized cells are not derived from experiments (they are models). That is bad news for us.

In [83]:
df_o_m.columns;
In [84]:
rfc = RandomForestClassifier()
X = df_combined.values
X;
In [85]:
df_o_m.values;

> 99%

of models are classified as models when optimized models are not included.

If the random forest is trained on the optimized models too Then the area under the ROC curve becomes closer to 1.

When we feed in the 7 new optimized models as "validation data", in this context, the RFC is still okay at classifying our new models correctly as models, and not data. However, the RFC performance is significanlty worse, as the optimized cells have tricked the RFC 4 times.

Also since the output of the TSNE-PCA varies with each run, as it is seeded with a psuedo random numnber generator, the projection space that the RFC acts on is different each time. Meaning that the FPR and the TPR rates vary slightly on each run.

In the plot below we show the we show the decision boundary as used by our classifier.

The decision boundary llows us to see if the newer optimized models are classified as data or models

In [86]:
import plotly.express as px
import plotly.graph_objects as go
  • Blue means experimental data, red means model, as machine categorized using Sklearns Random Forest classifier.

A general trend is apparent at the macro scale. It seems as if the bottom 2/3rds of the plane belong to data, and the remaining upper 1/3rd of the plane belongs to models, however you can also see on a micro scale there are lots of small pockets, or islands of model decision territory inside, what is more generally regarded as data territory.

Although the large red dots, appear in the correct side of the macro decision boundary, zooming in would reveal that these optimized models are in fact enveloped by model island that is excatly small enough to contain them. Therefore random forest classification, correctly classifies the optimized models as models.

The above figure allows us

To argue that the newer optimized models are closer to cluster centres and often fall on the experimental data side of the decision boundary.

Lets switch back to using all 38 features to classify

Only as its easier for me to debug, and I can make progress more quickly. Using cross validation below you can see that this approach is generalizable.

In [87]:
from sklearn import metrics
In [88]:
#!pip install --update sklearn
len(groundtruth)
groundtruth
df_combined.head()
Out[88]:
AP1RateOfChangePeakToTroughTest_3.0x AP2DelaySDTest_1.5x AP2DelaySDTest_3.0x AP_end_indices_3.0x AP_fall_indices_1.5x AP_fall_indices_3.0x AP_rise_indices_1.5x AP_rise_time_1.5x AP_rise_time_3.0x APlast_width_1.5x ... threshold_index_1.5x threshold_index_3.0x threshold_t_1.5x threshold_t_3.0x upstroke_index_1.5x upstroke_index_3.0x upstroke_t_1.5x upstroke_t_3.0x voltage_after_stim_1.5x voltage_after_stim_3.0x
313861539 0.009 0.307 0.307 0.087 0.113 0.087 0.113 -0.009 -0.004 -1.239e-03 ... 0.202 0.176 0.110 0.084 0.202 0.176 0.110 0.084 0.103 0.100
313861677 0.023 -0.061 -0.061 -0.208 -0.078 -0.207 -0.077 -0.017 -0.003 -3.281e-03 ... 0.254 0.187 -0.026 -0.122 0.254 0.187 -0.026 -0.122 0.007 -0.047
313862167 0.013 -0.034 -0.034 0.119 0.125 0.119 0.125 -0.012 -0.005 -1.837e-03 ... 0.268 0.254 0.128 0.116 0.268 0.254 0.128 0.116 -0.111 -0.133
313862167 0.013 -0.034 -0.034 0.119 0.125 0.119 0.125 -0.012 -0.005 -1.837e-03 ... 0.268 0.254 0.128 0.116 0.268 0.254 0.128 0.116 -0.111 -0.133
313862167 0.006 -0.016 -0.016 -0.318 0.059 -0.316 0.059 -0.006 -0.006 -8.722e-04 ... 0.127 0.121 0.061 0.055 0.127 0.121 0.061 0.055 -0.053 0.537

5 rows × 51 columns

In [89]:
from sklearn.model_selection import train_test_split
for i in range(0,10):
    X_train, X_test, y_train, y_test = train_test_split(df_combined.values, groundtruth, test_size=0.5, random_state=42)    

rfc = RandomForestClassifier()

All 7 seven new models are classified as data

Ie it is false that they are percieved as models, its true that they are percieved as data.

This will enable us to use a cross-validation Approach.

Cross-validation will help us to check the generalizability of our model, by better navigating the bias-variance tradeoff.

Report on misclassification.

Even though RFC can be over-fit by using all the data over 148 features a False negative is still sometimes reported. Experimental data point with identifier: "482764620" is sometimes falsely classified as a model, but we know from ground truth that it is an experiment.

The outputs of thistest are a bit different each time.

Repeat above with just experimental data

In [90]:
# make model dataframe
model_idx = [idx for idx in df.index.values if type(idx)==str]
model_no_trans_df = df[df.index.isin(model_idx)]
model_no_trans_df.index.name = 'Cell_ID'
model_df = model_no_trans_df.copy()
model_df.index.name = 'Cell_ID'

# make experiment dataframe
experiment_idx = [idx for idx in df.index.values if type(idx)==int]
experiment_no_trans_df = df[df.index.isin(experiment_idx)]
experiment_df = experiment_no_trans_df.copy()
experiment_df;
In [91]:
model_df[:] = ss.fit_transform(model_no_trans_df.values);
In [92]:
model_no_trans_df.head();
In [93]:
model_df.head();
In [94]:
# PLay around with the value of perplexity. Recommended values are between 5 and 50.  
#See if any of the clusters that pop out mean anything (do they match existing cell type clusters?  
# Do they match known firnig patterns?  
# Where does the data fit in when it is also plotted in this space?)

perplexities = [25,30,35,40]

df = model_df.copy()

fig, ax = plt.subplots(2,2)
ax = ax.ravel()

for i, perp in enumerate(perplexities):

    # init = 'pca' or 'random'
    tsne = TSNE(n_components=2,
                init='random',      
                random_state=0,
                perplexity=perp,         # default = 30, should be less than the number of samples
                n_iter=1000)             # default = 1000


    %time tsne.fit(df.values) # can't use transpose


    ax[i].scatter(*tsne.embedding_.T);

    if i in [2,3]:
        ax[i].set_xlabel('TSNE-1')

    if i in [0,2]:
        ax[i].set_ylabel('TSNE-2')
    ax[i].set_title('Perplexity = %s' %perp)
CPU times: user 17.3 s, sys: 2.88 s, total: 20.2 s
Wall time: 20.3 s
CPU times: user 19.3 s, sys: 2.87 s, total: 22.2 s
Wall time: 22.3 s
CPU times: user 20.5 s, sys: 2.59 s, total: 23.1 s
Wall time: 23.2 s
CPU times: user 23.2 s, sys: 2.76 s, total: 26 s
Wall time: 26.1 s
In [95]:
try:
    os.mkdir('data')
except:
    pass
filename = os.path.join(path2data,'new_cortical_ephys.csv')

model_df.to_csv(filename)

filename = os.path.join(path2data,'new_cortical_ephys_no_trans.csv')
model_no_trans_df.to_csv(filename)

filename = os.path.join(path2data,'experiment_ephys_no_trans.csv')
experiment_no_trans_df.to_csv(filename)
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]: